# A 480 mW 2.6 GS/s 10b Time-Interleaved ADC With 48.5 dB SNDR up to Nyquist in 65 nm CMOS

Kostas Doris, Member, IEEE, Erwin Janssen, Member, IEEE, Claudio Nani, Athon Zanikopoulos, Member, IEEE, and Gerard van der Weide

Abstract—This paper presents a 64-times interleaved 2.6 GS/s 10b successive-approximation-register (SAR) ADC in 65 nm CMOS. The ADC combines interleaving hierarchy with an open-loop buffer array operated in feedforward-sampling and feedback-SAR mode. The sampling front-end consists of four interleaved T/Hs at 650 MS/s that are optimized for timing accuracy and sampling linearity, while the back-end consists of four ADC arrays, each consisting of 16 10b current-mode non-binary SAR ADCs. The interleaving hierarchy allows for many ADCs to be used per T/H and eliminates distortion stemming from open loop buffers interfacing between the front-end and back-end. Startup on-chip calibration deals with offset and gain mismatches as well as DAC linearity. Measurements show that the prototype ADC achieves an SNDR of 48.5 dB and a THD of less than -58 dB at Nyquist with an input signal of 1.4  $V_{\rm pp-diff}$ . An estimated sampling clock skew spread of 400 fs is achieved by careful design and layout. Up to 4 GHz an SNR of more than 49 dB has been measured, enabled by the less than 110 fs rms clock jitter. The ADC consumes 480 mW from 1.2/1.3/1.6 V supplies and occupies an area of 5.1 mm<sup>2</sup>.

Index Terms—Analog-to-digital converter, calibration, clock jitter, direct sampling receiver, Nyquist converter, successive approximation register, time-interleaving, timing skew, track-and-hold.

#### I. INTRODUCTION

RECENT trends in cable reception for data and video call for the simultaneous reception of more than 16 channels (6 MHz/8 MHz wide), arbitrarily located in the 48–1002 MHz TV band, for next generation home-gateways that approach 1 Gb/s at the customer premises equipment. A low power ADC that can digitize the entire TV band with sufficient resolution and integrate with a baseband system-on-chip (SoC) in advanced CMOS enables the direct sampling receiver architecture [1] shown in Fig. 1. Such a solution alleviates several challenges stemming from the integration of multiple single-channel or wideband [2] zero- or low-IF receivers.

An evolution in the field of time interleaved ADCs can be observed in recent literature [3]–[8]. For example, in [5] interleaving of 160 6b successive-approximation registers (SARs) enabled 40 GS/s at 1.5 W for optical communications. The potential of SARs for higher resolutions at GS/s rate can be observed in [6], further underpinned by the continuing progress in power efficiency for single SAR ADCs at 8–14b level [9]–[13]. However, the complexity of interleaving ADCs, namely the distribution of input, clock and reference signals to all track/hold (T/H) circuits and ADCs, together with timing, bandwidth, offset, gain and linearity mismatches in combination with high frequency linearity limitations and clock jitter, has restricted SAR ADCs to insufficient performance levels for the concept of Fig. 1. Alternative approaches incur excessive power consumption and system complexity [14].

Manuscript received April 30, 2011; revised June 25, 2011; accepted July 29, 2011. Date of publication October 03, 2011; date of current version November 23, 2011. This paper was approved by Guest Editor Yiannos Manoli.

The authors are with NXP Semiconductors, 5656AE Eindhoven, The Netherlands (e-mail: kostas.doris@nxp.com).

Digital Object Identifier 10.1109/JSSC.2011.2164961

In this work we present a 10b 2.6 GS/s ADC [15], which interleaves 64 reduced-radix SAR ADCs employing current steering DACs. Interleaving complexity is handled by introducing an interleaving hierarchy with a feedforward–feedback interface architecture. Timing and bandwidth matching are achieved intrinsically, while gain and offset mismatches and DAC linearity are calibrated on-chip.

The remainder of this paper is structured as follows. The main issues of today's interleaved ADC architectures are described in Section II. Section III introduces the architecture proposed in this work while Sections IV and V describe the interleaved T/H and its interface with the SAR ADCs, respectively. Section VI focuses on the design of the SAR ADC. Measurement results are presented in Section VII. Finally, conclusions are drawn in Section VIII.

#### II. TIME INTERLEAVED CONVERTERS: LIMITATIONS AND ARCHITECTURES

A composite broadband cable signal, consisting of as many as 150 modulated carriers covering 1 GHz of bandwidth, imposes several requirements on a time interleaved ADC. The total integrated thermal noise power should be 55 dB below the power of a full scale sinusoidal carrier to allow sufficient SNR for each modulated carrier, while the sampling bandwidth should be in excess of 1 GHz to avoid limiting the carrier SNR close to Nyquist due to signal attenuation. Offset mismatch errors, residing in fixed spectral locations and being independent of the signal power, should be sufficiently low to avoid creating spurious tones at channel locations. In addition, the large number of carriers imposes strict linearity requirements and necessitates THD levels as low as $-60\,\mathrm{dB}$ in order to keep intermodulation effects sufficiently low. Timing mismatches [16], [17] also result in significant interference tones in the whole band and have to be limited to below 1 ps. Last but not least, a fully loaded spectrum results in significant broadband noise due to clock jitter, where the noise power is strongly dependent on the spectral power allocation at higher frequencies [18], [19], resulting in jitter specifications below 0.5 ps rms.
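As a rough sanity check on the timing figures quoted above, the following sketch evaluates the textbook single-tone relation $\mathrm{SNR} = -20\log_{10}(2\pi f_{\rm in}\sigma_t)$ for a full scale sine wave. This relation is an assumption brought in for illustration: the specification in the text is derived for a fully loaded multi-carrier spectrum [18], [19], so the numbers below are only indicative. The function name is hypothetical.

```python
import math

def jitter_limited_snr_db(f_in_hz: float, sigma_t_s: float) -> float:
    """SNR ceiling in dB for a full-scale sine at f_in_hz sampled with rms timing error sigma_t_s.

    Single-tone approximation only; it does not model a multi-carrier spectrum.
    """
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s)

# (input frequency, rms jitter) pairs taken from figures quoted in the paper
cases = [(1.0e9, 0.5e-12), (4.0e9, 110e-15)]
for f_in, sigma in cases:
    snr = jitter_limited_snr_db(f_in, sigma)
    print(f"{f_in / 1e9:.1f} GHz, {sigma * 1e15:.0f} fs rms -> {snr:.1f} dB")
```

Under this approximation, 110 fs rms jitter leaves a ceiling of roughly 51 dB at 4 GHz, which is consistent with the measured SNR of more than 49 dB reported in the abstract.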



Fig. 1. Direct sampling receiver for cable applications.



Fig. 2. Time Interleaved ADC architectures: (a) Hierarchical, e.g., [4], and (b) Pipelined without T/H hierarchy, e.g., [6], [20].

Simultaneously, the use of parallelism exacerbates the aforementioned signal deficiencies. In the case of massively parallelized low resolution ADCs for GS/s rates, the ADC array input interconnect [4], [5], [20] dominates the input load and becomes challenging to drive at high frequencies with low noise, good signal integrity and small phase differences among interleaved ADC units. As resolution increases, the challenge of driving the sampling capacitor progressively takes over from that of the interconnect array [4], [6], [21]–[24]. This is due to thermal noise requirements (kT/C), which are further constrained by limited supply voltages in advanced CMOS and hence dictate a significant reduction of the number of interleaved channels from hundreds of units to just a couple. Bandwidth mismatch [16], [17] also becomes important, e.g., because of T/H bandwidth limitations imposed by large sampling capacitors [22].
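The kT/C constraint can be made concrete with a small calculation. The sketch below assumes a differential sampler with one capacitor per side (total sampled noise power $2kT/C$), a temperature of 300 K, and that the entire 55 dB noise budget is spent on a single sampling operation. None of these assumptions is stated in the text, and the function name is hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K

def min_sampling_cap_f(v_pp_diff: float, snr_db: float, temp_k: float = 300.0) -> float:
    """Smallest per-side capacitor (farads) meeting snr_db for a full-scale sine.

    Assumes differential sampling with noise power 2kT/C and no other noise sources.
    """
    amplitude = v_pp_diff / 2.0                   # differential peak amplitude
    signal_power = amplitude ** 2 / 2.0           # power of a full-scale sine
    noise_budget = signal_power / 10 ** (snr_db / 10.0)
    return 2.0 * K_B * temp_k / noise_budget

c_full_swing = min_sampling_cap_f(1.4, 55.0)      # 1.4 Vpp-diff as in the abstract
c_half_swing = min_sampling_cap_f(0.7, 55.0)      # same SNR target with half the swing
print(f"1.4 Vpp-diff: {c_full_swing * 1e15:.1f} fF per side")
print(f"0.7 Vpp-diff: {c_half_swing * 1e15:.1f} fF per side")
```

Halving the available swing quadruples the required capacitance, which illustrates why limited supply voltages tighten the constraint.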

Modern CMOS technologies offer sub-ps timing accuracy for a single stage, as the steepness of clock signals between locally placed blocks increases due to reduced interconnect and gate load, thanks to the reduced size of devices and shorter distances between blocks. However, this advantage evaporates with large T/H arrays that require clock buffering and introduce slow clock transitions due to interconnect and driver bandwidth limitations; both aggregate the impact of device mismatch on timing skew, while IR-drops in supplies and bias lines, gradients, etc. add significant systematic timing skew. As a result, reliably achieving 0.1–1 ps level timing accuracy is very challenging, both with and without calibration [4]–[6], [20], [22], [25].

Power supply and substrate noise pick-up in clock buffering chains, caused by digital activity in embedded applications, also becomes important, making the generation of a low jitter ADC clock a difficult task. Additionally, the clocking system is often itself a noise source for the ADC, due to the large peak currents it generates.

Today's interleaved ADC architectures manage these issues with the use of T/H hierarchy [4], pipelining and double-sampling techniques [26], which all attempt to reduce the number of T/Hs. T/H hierarchy allows many ADC units to be connected to a single T/H, as shown conceptually in Fig. 2(a). This approach makes it simpler to drive signal and clock at higher frequencies, but at the cost of shifting the problem of noise and nonlinearity to the T/H, due to the requirement for an additional sampling phase and T/H buffering. Pipelining offers another form of hierarchy, hiding the input interconnect behind the first stage of the pipeline as shown in Fig. 2(b) and increasing the unit ADC throughput, hence requiring fewer T/Hs. This is restricted by the maximum number of stages in the pipeline, further limited by the practice of resolving many bits in the first stage for power efficiency reasons.

Recently reported GS/s 6b pipelined interleaved ADCs use only a few ADC units without T/H hierarchy [25], [27] and leverage speed through the use of open loop amplifiers and calibrations. However, at higher resolutions, the residue amplification with open loop amplifiers becomes challenging, while more interleaved ADC units are required because the pipeline alone does not suffice to increase total throughput, resulting in a large ADC array that introduces power and complexity limitations [20]. Moreover, T/H hierarchy for pipelined ADCs is not recommended beyond four units [28] due to its high noise, linearity and power consumption penalty. Alternative approaches have been used in [26], [29], [30] but suffer from bandwidth limitations. A SAR ADC without T/H hierarchy as in [3], [7] benefits from the absence of resampling, but is prone to the same aforementioned interleaving limitations when extended to higher resolutions and GS/s operation. The 10b ADC in [6] reported a two-stage pipelined SAR ADC to double the throughput of the SAR ADC unit, at the cost of an additional resampling phase and an open loop buffer between each T/H and SAR ADC.

Fig. 3. Proposed ADC architecture.

A SAR ADC is more suitable to combine with T/H hierarchy, offering the equivalent of one sampling operation less than the pipelined ADC (typically, the secondary stages of the pipeline are designed to contribute noise equal to the kT/C noise of the input stage), and without the need to implement accurate residue amplification, in exchange for the impact of loop noise. For example, in [4] T/H hierarchy with open loop buffers, as shown in Fig. 2(a), made it possible to manage the input and clock interconnect of 160 6b ADCs. However, it still required a 6 dB power splitter to drive the input load due to the limited number of ADC units that could be attached to a single T/H. This approach has not been used so far for higher resolutions.

A SAR hierarchical interleaved architecture has been selected in this work because of its higher potential for parallelism. The limitations of the existing interleaved architectures have been addressed in this work with the architecture described in Sections III–VIII.

#### III. ARCHITECTURE

The proposed hierarchical interleaved ADC architecture is shown in Fig. 3. In the ADC, the input signal is distributed to the four T/Hs without need for an on-chip buffer. Each T/H drives 16 reduced radix SAR ADCs to a combined total of 64 ADC units, arranged in four Quarter ADC arrays (QADC). Each T/H drives its QADC array with a feedforward–feedback multiplexed open loop buffer interface, which will be detailed in Section V. An on-chip digital calibration engine per QADC removes gain and offset mismatches, corrects the DAC nonlinearity within each SAR ADC, and realizes the non-binary to binary mapping on the output data.

The operation of the ADC is described next, using the timing diagram shown in Fig. 4. The ADC is clocked from a single $f_s = 2.6$ GHz clock. Each T/H operates at 650 MHz with a duty cycle of 50%, meaning that tracking and hold use two periods $T_s=1/f_s$ each. The SAR ADCs of each QADC re-sample the data provided by the T/H and operate according to the SAR algorithm, with an internal clock frequency of 650 MHz. A SAR ADC unit outputs 10b data after 12 cycles, one for resampling and 11 for quantization. Only 12 SAR ADCs per QADC are needed for interleaving at 2.6 GHz; the rest are implemented for redundancy. The total of 16 SAR ADCs can be used in any pre-selected or random order during operation.
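The interleaving arithmetic in the paragraph above can be checked with a few lines. The sketch uses only numbers given in the text; the variable names are illustrative.

```python
import math
from fractions import Fraction

F_S_HZ = 2_600_000_000   # aggregate sampling clock, from the text
N_TH = 4                 # interleaved T/Hs, one per QADC
SAR_CYCLES = 12          # 1 resampling cycle + 11 quantization cycles
SARS_PER_QADC = 16       # implemented SAR units per QADC

th_rate_hz = Fraction(F_S_HZ, N_TH)            # each T/H delivers one sample per period
sar_clock_hz = th_rate_hz                      # SAR internal clock equals the T/H rate
conversion_time_s = SAR_CYCLES / sar_clock_hz  # time one SAR unit is busy per sample

# A new sample arrives every T/H period, so enough units must be available
# to cover one full conversion time.
min_units = math.ceil(th_rate_hz * conversion_time_s)
spare_units = SARS_PER_QADC - min_units
unit_rate_hz = th_rate_hz / min_units

print(f"T/H rate: {float(th_rate_hz) / 1e6:.0f} MS/s")
print(f"SAR units needed per QADC: {min_units}, spare: {spare_units}")
print(f"Per-unit conversion rate: {float(unit_rate_hz) / 1e6:.2f} MS/s")
```

Each SAR unit therefore converts at roughly 54 MS/s, and four of the 16 units per QADC are available as redundancy.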

A clock engine receives and conditions the main 2.6 GHz clock and provides it to the four T/Hs that are operated directly from it. In addition, it generates four 650 MHz clocks with phases 0°, 90°, 180°, 270°, respectively, that are used to clock each QADC and the calibration logic. These clocks have programmable delays, to maximize the sampling window between the front-end and back-end sampling operations. A data combiner synchronizes the four data streams from the QADCs prior to sending them to the ADC output (Fig. 3).

The partition of the ADC in two domains allows the T/H and SAR ADC arrays to be optimized separately from each other. The use of only four T/Hs drastically reduces the input and clock interconnect, enables wide input bandwidth and allows use of a simple clocking architecture without the need for timing correction, which is explained in more detail in Section IV. The feedforward–feedback multiplexed open loop buffer interface partitions the total interconnect stemming from the large ADC array, such that only a fraction of it is directly connected to the T/H. This enhances sampling linearity and allows for many ADCs to connect to a single T/H. In addition, this interface eliminates the linearity requirements of the open loop buffers, enabling large signal swing, low capacitor values, and consequently, low power consumption and high speed. The SAR ADC architecture splits the sampling and DAC functions, allowing the sampling node to be dimensioned for thermal noise, and not for matching. This is an advantage, given the targeted 10b objective. In combination with the current steering DAC, this further enables the interface mentioned earlier, and simplifies the reference distribution along the large ADC arrays, making the calibrations for gain and offset mismatches a mere addition of currents.

Fig. 4. Timing diagram of the ADC.

Fig. 5. Clocking hierarchy and architecture.

#### IV. TRACK/HOLD FRONT-END

The circuit topology of the bootstrapped switch and associated clocking reflect the requirement to achieve wide bandwidth and high linearity, while assuring timing accuracy intrinsically. The employed clocking architecture is shown in Fig. 5. Clock distribution and processing are realized with current-mode logic (CML) in order to avoid generating clock spurs that could affect the ADC performance, and to enhance robustness against substrate noise stemming from digital activity in an SoC application. CML clocking is also very resilient against process, supply voltage and temperature (PVT) variations. The common 2.6 GHz low swing differential clock signal provided at the ADC input is conditioned by two buffers, and subsequently distributed to the four T/Hs via a shielded H-tree structure. A short clock path has been achieved thanks to the use of four T/Hs and the small height of the ADC array. This resulted in a clock interconnect of less than 200 fF, despite extensive shielding.

The local clocking unit and sampling switch are shown in Fig. 6. The local clock unit receives the main differential clock and translates it, with the help of control signals and the bootstrap circuit, to a CMOS pulse, which defines accurately the track and hold moments for the sampling switch, similar to the method proposed in [6]. The switch is differential and uses a cross-coupled path to cancel signal feedthrough. The voltage references for calibration are supplied directly to the sampling capacitor to avoid additional linearity and bandwidth margins for a multiplexer at the input as it is done in other work [21].

Fig. 6. Circuit implementation of the local clocking block and sampling switch.

Fig. 7. Bootstrapping method according to [31]: (a) circuit topology, and (b) timing diagram.

Switch bootstrapping is required in order to limit the impact of bandwidth mismatch at the sampling node, to achieve a sampling linearity of more than 60 dB, and to reduce the impedance modulation of the T/H input as a function of the signal amplitude. The commonly used bootstrapping method [31] is shown in Fig. 7. In the tracking phase the pre-charged capacitor $C_B$ bootstraps the switch MS1 to Vdd. In the hold phase, the switch gate is grounded. In this topology, the rise time of $V_{G1}$ is determined by the time constant of the bootstrap loop, given by the series connection of $C_B$ and $C_{G1}+C_P$ and the on-resistances $R_{\mathrm{MN},on}$ and $R_{\mathrm{MP},on}$ of transistors MN and MP, respectively, in the bootstrapping loop. Capacitor $C_B$ is the bootstrap capacitor, $C_{G1}$ is the switch gate load and $C_P$ the total parasitic capacitance from devices and interconnect [31]. The time constant can be expressed as

$$\tau = (R_{\mathrm{MN},on} + R_{\mathrm{MP},on}) \frac{C_B(C_{G1} + C_P)}{C_B + C_{G1} + C_P} \approx (R_{\mathrm{MN},on} + R_{\mathrm{MP},on})(C_{G1} + C_P), \quad C_B \gg C_{G1} + C_P. \tag{1}$$
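To see how quickly the approximation in (1) becomes accurate, the sketch below evaluates the exact series expression against its large-$C_B$ limit. The on-resistance and capacitance values are placeholders chosen for illustration and are not taken from the design.

```python
def bootstrap_tau(r_mn_on, r_mp_on, c_b, c_g1, c_p):
    """Exact time constant of (1): total on-resistance times C_B in series with C_G1 + C_P."""
    c_load = c_g1 + c_p
    return (r_mn_on + r_mp_on) * c_b * c_load / (c_b + c_load)

def bootstrap_tau_limit(r_mn_on, r_mp_on, c_g1, c_p):
    """Large-C_B limit of (1)."""
    return (r_mn_on + r_mp_on) * (c_g1 + c_p)

# Assumed example values (ohms and farads), not from the paper
R_MN_ON, R_MP_ON = 200.0, 300.0
C_G1, C_P = 20e-15, 10e-15

limit = bootstrap_tau_limit(R_MN_ON, R_MP_ON, C_G1, C_P)
for ratio in (1, 3, 10, 100):
    c_b = ratio * (C_G1 + C_P)
    tau = bootstrap_tau(R_MN_ON, R_MP_ON, c_b, C_G1, C_P)
    print(f"C_B = {ratio:>3} x (C_G1 + C_P): tau = {tau * 1e12:.2f} ps, tau/limit = {tau / limit:.3f}")
```

The ratio of the exact value to the limit is $C_B / (C_B + C_{G1} + C_P)$, so a smaller $C_B$ always shortens the time constant.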

At the initial phase of the track transient, before $V_{G1}$ assumes a sufficiently large value to fully turn on MN, MP is saturated and dictates the current flowing from $C_B$ to charge node G1. The slowly rising $V_{G1}$ defines $R_{on}$ of MN, which influences how quickly $V_{G1}$ rises to track the signal. Increasing MP does not make the transient fast enough and introduces more parasitics. This mechanism in combination with a large $C_B$, which is needed to avoid loss of overdrive at the beginning of the bootstrapping phase, and the associated parasitics (a) limits the switch bandwidth and modulates the T/H input impedance for a significant portion of the tracking phase, and (b) limits the bootstrap bandwidth causing loss of sampling linearity at high frequencies.

The circuit shown in Fig. 8 improves on these limitations by combining the merits of a simple switch with a bootstrapped one to make the rising edge independent of  $C_B$ . The operation is divided in three phases. In phase A, node G1 is charged directly to Vdd via M3 as it is normally done with a non-bootstrapped switch. In phase B, the bootstrapping capacitor is connected to the gate of MS1, similarly to the conventional method, activating the bootstrapping loop M4– $C_B$ –M7. Transistor M7 is not controlled with the gate signal G1 as in [31] to reduce parasitic loading at node G1 further and enable a faster pull-down for the hold phase. M4 is controlled similarly to [31] whereas the hold phase C is activated with the pull-down path to ground defined by M2.

Fig. 8. Proposed bootstrap circuit and timing diagram.

Fig. 9. Hierarchical interfacing between T/H and SAR ADCs.

With this circuit, the track rising transient is steep and the bootstrapping phase always starts from Vdd without loss of overdrive that would otherwise necessitate the use of a large $C_B$. This improves the recovery of the sampling capacitors from the previously stored values and allows choosing $C_B = C_{G1}$ to double the bootstrap bandwidth in phase B compared to the conventional approach, at the cost of a remaining signal dependent gate overdrive. This choice also ensures that M3 and M5 do not compress the signal. In contrast to [31], these transistors limit the maximum swing allowed by this topology, compressing the bootstrapped signal when it exceeds Vdd by approximately a threshold voltage. With a threshold voltage of around 350 mV, an input signal of $1.4\,\mathrm{V_{pp-diff}}$, and attenuation defined by $C_B = C_{G1}$, this mechanism is not limiting for this design. Reduced parasitics and a small switch size also enable a steep track to hold transition, improving timing accuracy and sampling linearity.
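The bandwidth claim for $C_B = C_{G1}$ can be followed from (1) if $C_P$ is neglected, a simplification made here only for illustration. Setting $C_B = C_{G1}$ gives

$$\tau_B = (R_{\mathrm{MN},on} + R_{\mathrm{MP},on}) \frac{C_{G1} \cdot C_{G1}}{C_{G1} + C_{G1}} = \tfrac{1}{2}(R_{\mathrm{MN},on} + R_{\mathrm{MP},on})\,C_{G1},$$

which is half of the large-$C_B$ limit $(R_{\mathrm{MN},on} + R_{\mathrm{MP},on})\,C_{G1}$ and hence corresponds to twice the bootstrap bandwidth. Under the same simplification, the capacitive divider formed by $C_B$ and $C_{G1}$ transfers only

$$\frac{C_B}{C_B + C_{G1}} = \frac{1}{2}$$

of the input signal to the gate, which is where the remaining signal dependent overdrive comes from. As a rough plausibility check, and assuming that each side of the $1.4\,\mathrm{V_{pp-diff}}$ input swings by about $\pm 0.35$ V around its common mode and that the gate excursion is referenced to Vdd, the signal dependent part of the gate voltage is about $\pm 0.175$ V, which stays below the threshold voltage of around 350 mV quoted above.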

Maintaining a constant impedance at the T/H input is very important in order to simplify the interfacing with an off-chip signal source and avoid the need for additional on-chip buffering that adds noise, nonlinearity and power consumption. The clocking scheme shown in Fig. 4, with two T/Hs always connected to the input, in combination with on-chip input termination significantly reduces dynamic impedance variations compared to, for example, an ADC with a single T/H turned on and off every $T_s/2$. The remaining dynamic impedance variation when one T/H goes into the hold phase while another enters the tracking phase is further reduced with the fast pre-charge to Vdd. The combination of the aforementioned techniques makes it possible to achieve small impedance variations at the T/H input in order to drive it with an LNA-VGA as a 50 $\Omega$ load without the need for an on-chip buffer.

All nodes in the T/H and clock distribution have been optimized with the methods presented in [19], i.e., optimizing the slope of each node versus the mismatch, distinguishing between RC-limited and slew-rate-limited nodes, providing identical power supply levels to all local clock units, balancing clock current loops, co-optimizing the aspect ratio of interconnect and driving impedance of the clock buffer, and finally extracting and back-annotating all blocks and interconnect.

#### V. FEEDFORWARD-FEEDBACK INTERFACE

An open loop interface between a T/H and an array of SAR ADCs is shown in Fig. 9. The source follower topology is widely accepted for low to medium resolution high speed ADCs [6], [8], [27], [32]–[36] but several limitations make it less suitable for beyond 60 dB THD levels and a high degree of T/H hierarchy. The compressive buffer characteristic and the output resistance modulation due to short channels of MOS devices in modern CMOS processes result in strong nonlinearity. Despite the use of replica well-bias [8], [35], bootstrapping [6], [37], and cascoding [33], these mechanisms necessitate low signal swing at the cost of SNR loss. The large nonlinear gate load of the buffer presents another issue, which is a well-known limitation of flash ADCs. The Miller capacitor between input and output of the buffer, in combination with bandwidth and slew rate limitations of the buffer, which has to handle the full input bandwidth during the tracking phase (Fig. 9(b)), introduces another error mechanism that limits sampling linearity at high frequencies [6]. When the T/H enters the hold phase the input node of the buffer is floating and the buffer still charges its output, introducing charge sharing between the two, which corrupts the sampled signal. Dealing with this mechanism requires large buffer bandwidth and slew rate capability. In [6] the buffer load is connected only during the hold phase to alleviate charge sharing at the cost of less time for settling in the hold phase, i.e., the buffer still has to charge the output load from its previously held value to the new one.

Fig. 10. Buffer partition and introduction of a demultiplexer.

T/H hierarchy with many ADCs aggregates these error mechanisms. These limitations were alleviated by changing the circuit topology in two steps, as described in Figs. 10 and 11. In the first step (Fig. 10), the buffer is initially partitioned in 16 smaller ones, each one driving the interconnect load that corresponds to one SAR ADC only. A demultiplexer (demux) based on plain NMOS switches and placed between the main sampling capacitor and each buffer connects only one buffer to the input sampling node. This partitions the corresponding total interconnect of the whole SAR ADC array in 16 parts and reduces significantly the buffer load connected to the main sampling capacitor, making it equivalent to an approach without hierarchy. In [26], [29], [30] hierarchy with global passive sampling techniques has been used to eliminate timing errors. However, in these approaches the master switch, the demux, the interconnect to each ADC and the sampling switch and capacitor of the unit ADC are placed in series. This restricts T/H bandwidth, introduces bandwidth mismatches and limits T/H hierarchy.

The linearity limitations of the open loop buffer are removed as a second step, by introducing the buffer in the SAR loop, as shown in Fig. 11. In charge-redistribution SAR ADCs, e.g., [3], the sampling and DAC functions are merged together into one array of capacitors, which makes this approach prone to linearity limitations of the interface. In this work, the DAC and sampling functions are split from each other. The introduction of a multiplexer allows placing the buffer in the SAR loop, as shown in Fig. 11. In the feed-forward tracking path, the T/H held signal $V_{\rm in}$ passes directly onto the resampling capacitor, where it is sampled as $V_{\rm in,comp}$ after being distorted by the buffer nonlinearity. In the SAR loop the buffer is subsequently placed in feedback into the loop, translating the otherwise linear DAC output $V_{\rm dac}$ to a compressed equivalent $V_{\rm dac,comp}$. Since the SAR operation is based on zero crossing detection, and both input and DAC signals are equally distorted, the difference signal will still result in the correct decision. Moreover, since only settled values are required from the buffer, the output impedance modulation of the buffer by the signal is not of concern anymore.
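The argument that equal distortion of input and DAC signals leaves the decisions unchanged can be illustrated with a behavioral model. The sketch below is a simplification under stated assumptions: it uses a plain binary search rather than the non-binary algorithm of this design, an ideal DAC, and a hypothetical `tanh` buffer characteristic that is compressive but strictly monotonic. All names and values are illustrative.

```python
import math

V_REF = 0.7    # assumed full-scale amplitude in volts (half of 1.4 Vpp-diff)
N_BITS = 10

def identity(v):
    return v

def compressive_buffer(v):
    """Hypothetical strictly monotonic, compressive buffer characteristic."""
    return math.tanh(1.5 * v) / 1.5

def sar_convert(v_in, input_path, dac_path):
    """Binary-search SAR: the comparator sees input_path(v_in) against dac_path(v_dac)."""
    held = input_path(v_in)  # value stored on the resampling capacitor
    code = 0
    for bit in reversed(range(N_BITS)):
        trial = code | (1 << bit)
        v_dac = (2.0 * trial / (1 << N_BITS) - 1.0) * V_REF
        if held >= dac_path(v_dac):
            code = trial
    return code

inputs = [-0.69 + 1.38 * i / 999 for i in range(1000)]
ideal = [sar_convert(v, identity, identity) for v in inputs]
feedforward_only = [sar_convert(v, compressive_buffer, identity) for v in inputs]
in_loop = [sar_convert(v, compressive_buffer, compressive_buffer) for v in inputs]

print("buffer in both paths matches ideal:", in_loop == ideal)
print("buffer in input path only matches ideal:", feedforward_only == ideal)
```

With the buffer in both paths every comparison has the same sign as in the ideal case, because a strictly monotonic function preserves ordering. With the buffer in the input path only, the codes follow the compressed characteristic and the distortion appears at the output.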

#### VI. SAR ADC IMPLEMENTATION

Fig. 12 shows the architecture of the complete interface and the SAR ADC. The buffer is realized with a source follower and is combined with the demux into one unit. The bias level is determined by the necessity to complete slewing before the hold transition of the T/H. To reduce the power consumption of the 64 buffers, two T/Hs are tracking simultaneously, without limiting the input bandwidth of the ADC. In addition, the SAR sampling capacitor is dimensioned for thermal noise, further reduced by the large signal swing. Finally, the interconnect load between a buffer and an ADC is kept low using a small ADC array height. The bandwidth of the buffer was designed to satisfy settling at the 650 MHz rate and to avoid influencing the sampling linearity below 60 dB up to the Nyquist frequency.

The choice of the SAR architecture is strongly affected by the usage of the feed-forward feed-back structure; it is pseudo-differential and consists of a sampling capacitor, a comparator preceded by a fully differential preamplifier, a SAR digital controller and a current steering main DAC. The SAR loop between the main DAC and the sampling capacitor is closed via the front-end multiplexer and the interfacing buffer. Two calibration DACs are used to tune the offset and the gain of the ADC. The non-binary successive approximation algorithm has been implemented in order to achieve high conversion speed with relaxed settling requirements. Moreover, the redundancy of the non-binary algorithm has been exploited to reduce the main DAC area using a DAC linearity calibration technique similar to [8]. Due to the reduced radix, the complete SAR conversion process requires 12 clock cycles (one for tracking, 11 for conversion) at 650 MHz to achieve a nominal resolution of 10b. While tracking, the input sampled by the front-end is passed via the interfacing buffer to the SAR sampling capacitor where it is sampled using a bottom plate sampling scheme. During the conversion mode the main DAC is connected via the buffer to the sampling capacitor that acts as subtraction point between the DAC and the sampled signal. This difference is then amplified by a preamplifier before being latched.

##### A. Preamplifier and Comparator

In this design a static preamplifier is used to increase the difference between the input signal and the main DAC before latching the decision with the comparator. The role of the preamplifier is threefold: a) it amplifies the difference signal, b) it mitigates the impact of comparator kickback and noise, and c) it limits the noise bandwidth at the input terminals of the comparator. The latter role is very important because at high speed the noise generated at every conversion step by the main DAC and by the preamplifier (sometimes called loop noise) becomes a significant contributor to the SNR. The total power of the loop noise referred to the preamplifier input can be expressed as $V_n^2 = \gamma(2kT/C_{\rm cmp}G^2)$, where $C_{\rm cmp}$ is the total capacitance at the comparator input, G is the preamplifier DC gain and $\gamma(>1)$ is an excess noise factor that models the noise generated by the DAC and the input stages of the preamplifier.

Fig. 11. Introduction of the buffer in the SAR loop.

Fig. 12. Interface and SAR ADC implementation.

The schematic of the preamplifier and the comparator is shown in Fig. 12. It consists of three open loop stages that drive a dynamic regenerative comparator [38]. The preamplifier differential pair topology has been chosen for its simplicity and performance stability. Thanks to calibration, device sizing is not constrained by any matching requirement. This is beneficial because the charge sharing between the preamplifier input capacitance and the sampling capacitor attenuates the sampled signal, hence increasing the impact of the preamplifier noise on the total SNR. Each stage of the preamplifier has been optimized according to different criteria. The first amplifier has been designed for low noise, moderate gain and small input capacitance. The second stage acts as a low gain/high bandwidth interface and provides low output impedance in order to drive the third stage effectively. Finally, the last amplifier has been optimized for limiting the noise bandwidth while providing high DC gain. This stage also suppresses the kickback of the regenerative comparator. The offset correction is performed by injecting a differential current at the first stage output, which is generated by a calibration DAC. The combination of the three stages exhibits a DC gain of 28 dB with an input capacitance of only a few fF.
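To get a feeling for the loop noise expression, the sketch below evaluates it numerically, reading the expression as $V_n^2 = \gamma \cdot 2kT / (C_{\rm cmp} G^2)$. The gain of 28 dB is taken from the text; the values of $C_{\rm cmp}$, $\gamma$ and the temperature are placeholders, since the paper does not state them, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def loop_noise_vrms(c_cmp_f: float, gain_db: float, gamma: float, temp_k: float = 300.0) -> float:
    """Input-referred rms loop noise, reading the expression as gamma * 2kT / (C_cmp * G^2)."""
    gain = 10 ** (gain_db / 20.0)  # convert the DC gain from dB to a voltage ratio
    return math.sqrt(gamma * 2.0 * K_B * temp_k / (c_cmp_f * gain ** 2))

# 28 dB is from the text; 50 fF and gamma = 2 are assumed example values
v_n = loop_noise_vrms(c_cmp_f=50e-15, gain_db=28.0, gamma=2.0)
lsb = 1.4 / 2 ** 10  # LSB size for a 1.4 Vpp-diff full scale at 10b
print(f"loop noise: {v_n * 1e6:.1f} uV rms, LSB: {lsb * 1e3:.2f} mV")
```

In this reading the input-referred rms noise is inversely proportional to the preamplifier gain and to the square root of $C_{\rm cmp}$, which is why the last stage is used to limit the noise bandwidth.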

##### B. Main DAC and Calibrations

The non-binary main DAC of the SAR converter has been realized using a PMOS based implementation of the current steering architecture (Fig. 13). This scheme shows good performance stability with respect to PVT and a good rejection of supply noise. Current steering has been selected because of the need to drive the long interconnect line between the DAC output and the sampling front-end. The use of a switched capacitor DAC without buffering would result in a SAR converter gain highly dependent on the amount of parasitic capacitance at the DAC output. A current steering DAC instead, with its low output impedance, can drive large capacitive loads without suffering from the same problem.

Fig. 13. Current-steering DAC implementation.

Fig. 14. Offset and gain calibrations.

Current steering significantly simplifies the problem of generation, distribution and correction of the references that define the gain of each of the 64 SAR ADCs. In fact, the full scale of a current steering DAC can be tuned by simply changing the current that is mirrored by the current sources. In this design, the reference currents for each SAR are generated centrally by a bias block and then distributed across the four converter arrays. By using currents (instead of voltages) the sensitivity to supply (ground) noise and parasitic coupling is significantly reduced. Locally to each SAR ADC, the reference current is tuned using a calibration DAC. The area of the DAC has been minimized by exploiting the redundancy embedded in the non-binary SAR algorithm. During the startup calibration, mismatches of the current sources are measured by the SAR ADC in a way similar to [8] and then used during operation to convert the non-binary decision output into a binary word. This approach requires only that the DAC weights be measured and avoids the necessity for calibration or trimming [7] of the DAC current sources. Whereas in [8] an additional highly accurate algorithmic ADC is implemented to generate reference samples that are used to calibrate the SAR ADCs, in our work each SAR ADC generates its own reference signals by converting a zero signal. This is enabled by the DAC radix of less than two in combination with accurate LSB sources. In more detail, since the sum of the first four LSBs is larger than the value of the fifth LSB, the fifth LSB can be measured using the smaller LSBs. Once the fifth LSB is measured, the same procedure can be repeated to measure the sixth LSB, etc. The calculations involved in this measurement consist of simple addition and subtraction operations in combination with averaging, enabling a small and efficient on-chip digital calibration engine instead of the software approach of [8].
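The role of redundancy and of the measured weights can be sketched with a simple behavioral model. Everything below is an assumption made for illustration: a uniform radix of 1.86 over 11 elements, a fixed mismatch pattern of a few percent on the larger elements, and a unipolar greedy search. The zero-input measurement procedure with averaging is not modeled; the measured weights are simply taken to equal the actual ones.

```python
def nominal_weights(radix, n):
    """Sub-radix-2 weight ladder in LSB units (hypothetical uniform radix)."""
    return [radix ** i for i in range(n)]

def measurable_from_smaller(weights, k):
    """True if element k does not exceed the sum of all smaller elements."""
    return sum(weights[:k]) >= weights[k]

def sar_decisions(v_in, actual_weights):
    """Greedy successive approximation, largest element first, using the physical DAC."""
    bits = [0] * len(actual_weights)
    acc = 0.0
    for i in reversed(range(len(actual_weights))):
        if acc + actual_weights[i] <= v_in:
            acc += actual_weights[i]
            bits[i] = 1
    return bits

def to_value(bits, weights):
    """Non-binary to binary mapping: weighted sum of the decisions."""
    return sum(b * w for b, w in zip(bits, weights))

nominal = nominal_weights(1.86, 11)
# Assumed relative mismatch; the four smallest elements are treated as accurate
mismatch = [0, 0, 0, 0, 0.02, -0.015, 0.01, -0.02, 0.025, -0.01, 0.03]
actual = [w * (1 + m) for w, m in zip(nominal, mismatch)]

print("fifth element measurable from smaller ones:", measurable_from_smaller(nominal, 4))
for v_in in (0.5, 100.25, 512.7, 900.1, 1000.3):
    bits = sar_decisions(v_in, actual)
    err_measured = v_in - to_value(bits, actual)   # mapping with measured weights
    err_nominal = v_in - to_value(bits, nominal)   # mapping with design weights
    print(f"{v_in:8.2f}: error {err_measured:+.3f} LSB (measured), {err_nominal:+.3f} LSB (nominal)")
```

In this model, mapping the decisions with the measured weights keeps the error within one LSB despite the mismatch, whereas mapping with the nominal weights leaves errors of several LSBs. This is the sense in which only the weights need to be known and the current sources themselves need not be trimmed.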

Offset and gain mismatches are calibrated in the analog domain using two calibration DACs and an externally provided voltage reference that is buffered on chip, as shown in Fig. 14. As for the main DAC, the current-steering architecture is selected for its simplicity, its performance stability with respect to PVT, and its resilience to supply and ground noise. Moreover, since current can easily be routed to where it is needed, the calibration DACs can be moved away from the critical blocks (e.g., the preamplifier), further simplifying the layout. The area of the calibration DACs is minimized by scaling their elements according to a non-binary law. During startup offset and gain calibration, the calibration engine derives the proper control words for the calibration DACs using the successive approximation algorithm.
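A minimal sketch of how a successive-approximation search could derive such a control word is given below. The element weights, the offset value, and the names `sar_calibrate` and `residual` are hypothetical, and the error is assumed to be positive so that a unipolar correction suffices; the actual calibration loop of the prototype is not described at this level of detail.

```python
def sar_calibrate(measure, weights):
    """Successive approximation over calibration-DAC elements, largest first.

    measure(bits) returns the remaining error for a trial control word,
    positive when the error is still under-corrected.
    """
    bits = [False] * len(weights)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    for i in order:
        bits[i] = True             # try switching this element on
        if measure(bits) < 0:      # over-corrected: switch it off again
            bits[i] = False
    return bits

# Hypothetical non-binary element weights, in calibration-LSB units.
cal_weights = [1.0, 1.8, 3.2, 5.8, 10.5, 18.9]
offset = 23.4                      # assumed error to be cancelled

def residual(bits):
    """Stand-in for an averaged zero-input conversion of the SAR ADC."""
    return offset - sum(w for w, b in zip(cal_weights, bits) if b)

control_word = sar_calibrate(residual, cal_weights)
```

In this example the search keeps the elements 18.9, 3.2, and 1.0, leaving a residual below one calibration LSB.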

#### VII. MEASUREMENT RESULTS

The prototype is realized in a baseline 7-metal 65 nm CMOS process and is integrated as part of a DOCSIS 3.0 [39] direct sampling receiver prototype [1]. The IC includes a PLL, a digital multi-channel selector block, four analog IF outputs with DACs, and digital outputs that are used to measure the ADC. It is packaged in an LGA132 package. Off-chip voltage and current references are received on chip and distributed to the blocks accordingly. Fig. 15 depicts the IC block diagram and the associated measurement setup. Measurements were performed using an external clock source, bypassing the on-chip PLL, and the ADC output data were fed directly to the logic analyzer without going through the on-chip DSP. The output signal is decimated by five internally to overcome the data acquisition speed limitation of the setup. It should be noted that the decimation causes the content of the complete sampled spectrum to fold into one fifth of the Nyquist band, which allows all spectral artifacts of the converter to be measured.
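The folding caused by the decimation can be checked with a few lines of arithmetic. The sketch below assumes that decimation simply keeps every fifth sample without filtering, which is what the folding of the complete spectrum implies; the helper name `folded_frequency` is chosen for illustration.

```python
def folded_frequency(f, fs):
    """Frequency in [0, fs/2] at which a tone at f appears when sampled at fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f

FS_MHZ = 2600                 # ADC sampling rate
FS_DEC_MHZ = FS_MHZ // 5      # effective rate after decimation by five

tone_92 = folded_frequency(92, FS_DEC_MHZ)          # 92 MHz input tone
tone_1250 = folded_frequency(1250, FS_DEC_MHZ)      # 1.25 GHz input tone
hd3_1250 = folded_frequency(3 * 1250, FS_DEC_MHZ)   # its third harmonic
```

With these numbers the 1.25 GHz tone lands at 210 MHz and its third harmonic at 110 MHz in the 260 MHz wide decimated band, while the 92 MHz tone stays at 92 MHz.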

Fig. 15. Block diagram of the IC implementation and measurement setup.

Fig. 16. Die photo.

Fig. 16 shows a die photo of the converter, which occupies a total of 5.1 mm<sup>2</sup>. The middle part of the layout is occupied by the T/Hs, the signal and clock routing, and the CML clock generator, and operates from a 1.3 V supply. All high-frequency loops are therefore concentrated in the middle of the IC. The input signal is routed from the bottom of the ADC, whereas the clock comes from the left between quarters B and D. The calibration logic is wrapped around the QADC arrays and occupies 50% of the total area. The output data are synchronized at the top of the ADC prior to being sent to the output. Deep N-well and extensive decoupling were used to reduce the impact of supply and substrate noise. The interface buffers and DACs use 1.6 V while the SAR and calibration logic use 1.2 V, leading to a total of 480 mW at 2.6 GS/s (excluding LVDS-like output buffers and external references). The calibration logic consumes only 36 mW.

Fig. 17 shows spectral plots with and without the calibration activated for a full scale 1.4  $\rm V_{pp-diff}$  input signal at  $f_{\rm in}=92$  MHz and  $f_s=2.6$  GS/s. The SNDR without calibration is 32.4 dB and improves to 52.8 dB with calibration. The total offset and gain mismatch tone power is approximately 10 dB below the total thermal noise level. The output power spectrum for an input signal at 1.25 GHz is shown in Fig. 18. Offset spurious tones remain at a constant level of -75.4 dBFS compared to Fig. 17, whereas the timing skew tones (the three most dominant tones) increase, as expected from their linear dependence on signal frequency. The most dominant one is at -54 dBFS. The HD3 remains at -64.2 dBFS, whereas HD5 and HD7 remain below -66 dBFS (HD7 resides next to the large spurious tone at -60 dBFS, shown with an arrow). Fig. 19 demonstrates the impact of randomization using the redundancy in the number of SAR ADC units in each QADC. The plot at the top of the figure shows the spectrum of one QADC with a pre-determined SAR ADC sequence, leading to a fixed pattern of offset and gain tones. The bottom plot shows how randomizing translates all offset and gain spurious tones into a noisy spectrum and further enhances the low spurious performance for the intended application at the cost of a minor increase of the noise floor (e.g., 0.1 dB). The measurements in this prototype support the theoretical results described in [40].
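The positions of the offset and timing skew tones discussed for Fig. 18 follow the standard time-interleaving relations (see, e.g., [16]): offset mismatch among M channels produces tones at multiples of $f_s/M$, and gain or skew mismatch produces tones at $k f_s/M \pm f_{\rm in}$. The sketch below evaluates these relations before decimation; the function names are illustrative, and M = 4 is used on the assumption that the skew tones stem from the four front-end T/Hs.

```python
def fold(f, fs):
    """Map a frequency into the first Nyquist zone [0, fs/2]."""
    f = f % fs
    return fs - f if f > fs / 2 else f

def offset_spurs(fs, m):
    """Offset-mismatch tone frequencies for m interleaved channels."""
    return sorted({fold(k * fs / m, fs) for k in range(1, m)})

def skew_spurs(f_in, fs, m):
    """Gain- or skew-mismatch tone frequencies for m interleaved channels."""
    spurs = set()
    for k in range(1, m):
        spurs.add(fold(k * fs / m + f_in, fs))
        spurs.add(fold(k * fs / m - f_in, fs))
    return sorted(spurs)

# Frequencies in MHz: 1.25 GHz input, 2.6 GS/s, four interleaved T/Hs.
th_skew_tones = skew_spurs(1250, 2600, 4)
th_offset_tones = offset_spurs(2600, 4)
```

For these values the relations give three distinct skew tone frequencies (50, 600, and 700 MHz), which is consistent with three dominant skew tones. After the decimation by five used in the measurement setup these tones fold once more.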

Fig. 17. Power spectrum plots with and without calibration for  $f_{\rm in}=92$  MHz,  $f_s=2.6$  GS/s.

Fig. 18. ADC spectrum with a 1.25 GHz input signal at 2.6 GS/s (decimated by 5).

Fig. 20 shows measured data for SNDR, SFDR, -THD and SNR versus  $f_{\rm in}$  at  $f_s=2.6$  GS/s for a full scale 1.4  $\rm V_{pp-diff}$  input signal with a fixed SAR ADC sequence (no randomization). Up to the Nyquist frequency the THD remains better than -58 dB. A graceful degradation can be observed beyond the Nyquist frequency leading to a THD of -55 dB at 2 GHz, which is attributed to the bandwidth limitations of the buffer, which has to handle the full bandwidth during the tracking phase. The SFDR at low frequencies is determined by the harmonic distortion of the signal source. In the decade between 200 MHz and 2 GHz, timing skew sets the maximum spurious component, and beyond 2 GHz harmonic distortion takes over. A timing skew spread of 400 fs rms was estimated by measuring at room temperature and under the same conditions several ICs and ana-



Fig. 19. Power spectrum plots without (top) and with (bottom) randomization mode for  $f_{\rm in}=22$  MHz,  $f_s=2.6$  GS/s for one QADC.

Fig. 20. Performance versus input frequency sweep.

Several measurements were performed to evaluate the performance of this ADC in a real life application, i.e., with the complete receiver operating with multi-stream internet and TV functionality. Measurements in such conditions, with 150 carriers imposing maximum digital activity and high temperatures, showed no noticeable degradation of the system performance or interference [1]. Table I summarizes the measured results of the prototype ADC.

#### VIII. SUMMARY AND CONCLUSIONS

A hierarchical time-interleaved SAR ADC architecture employing an open-loop buffer array operated in feedforward-sampling and feedback-SAR mode was presented. The hierarchy allows the T/H and SAR ADC functions to be optimized separately, with four T/Hs optimized for speed, linearity, and timing accuracy driving 64 SAR ADCs. The pre-charge bootstrap enabled wide bandwidth in the GS/s range. Splitting the sampling and DAC functions of the SAR operation allowed the buffer to be placed in a loop and effectively eliminated its linearity requirements, while the current-mode DAC approach simplified the reference generation and distribution in the ADC array. These techniques enabled a large input signal swing and low capacitor values, resulting in low-power and high-speed operation.

TABLE I PERFORMANCE SUMMARY OF PROTOTYPE ADC

| Parameter            | Value                           |
|----------------------|---------------------------------|
| Process              | 65 nm                           |
| Resolution           | 10b                             |
| Active area          | 5.1 mm<sup>2</sup>              |
| Supply               | 1.2/1.3/1.6 V                   |
| Input                | 1.4 $V_{\rm pp-diff}$           |
| Sampling rate        | 2.6 GS/s                        |
| Input termination    | 100 Ω differential              |
| 3 dB input bandwidth | >5 GHz                          |
| SNDR @ Nyquist       | 48.5 dB                         |
| SFDR @ Nyquist       | 53.8 dB                         |
| THD @ Nyquist        | <-58 dB                         |
| SNR @ Nyquist        | >52 dB                          |
| Jitter               | <110 fs rms                     |
| Power                | 480 mW (excluding LVDS buffers) |

This ADC showed that massive interleaving of 10b SAR ADCs in advanced CMOS processes with low power consumption is possible without the penalties of traditional interleaving architectures. The achieved performance level greatly simplifies the design of receivers for cable applications by, essentially, mitigating all the requirements of a traditional receiver's critical blocks with a single high-speed ADC. This ADC employs simple differential stages without linearity requirements, does not require intrinsic matching, and relies on a sampling switch and digital calibration techniques, made possible by the integration capabilities of deep-submicron technologies.

# ACKNOWLEDGMENT

The authors thank O. Jamin, F. Courtois, M. Kristen, and the BL TV Front-End team for inputs, top-level and packaging support, H. v. d. Ploeg and M. Vertregt for their contributions, Y. Tang for helping in layout, and L. Lo Coco and L. Warmerdam for their support.

# REFERENCES

- [1] E. Janssen *et al.*, "A direct sampling multi-channel receiver for DOCSIS 3.0 in 65 nm," in *Proc. IEEE Symp. VLSI Circuits*, 2011.
- [2] F. Gatta et al., "An embedded 65 nm CMOS baseband IQ 48 MHz–1 GHz dual tuner for DOCSIS 3.0," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3511–3525, Dec. 2009.
- [3] D. Drexelmayr, "A 6b 600 MHz 10 mW ADC array in digital 90 nm CMOS," in *IEEE ISSCC Dig.*, 2004, pp. 264–265.
- [4] P. Schvan et al., "A 24 GS/s 6b ADC in 90 nm CMOS," in IEEE ISSCC Dig., 2008, pp. 544–545.
- [5] D. Greshishchev *et al.*, "A 40 GS/s 6b ADC in 65 nm CMOS," in *IEEE ISSCC Dig.*, 2010, pp. 390–391.
- [6] S. M. Louwsma et al., "A 1.35 GS/s, 10 b, 175 mW time-interleaved AD converter in 0.13 μm CMOS," IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 778–786, Apr. 2008.
- [7] E. Alpman et al., "A 1.1 V 50 mW 2.5 GS/s 7b time-interleaved C-2C SAR ADC in 45 nm LP digital CMOS," in *IEEE ISSCC Dig.*, 2009, pp. 76–77, 77a.
- [8] W. Liu et al., "A 600 MS/s 30 mW 0.13 μm CMOS ADC array achieving over 60 dB SFDR with adaptive digital equalization," in IEEE ISSCC Dig., 2009, pp. 82–83.
- [9] M. van Elzakker et al., "A 10-bit charge-redistribution ADC consuming 1.9 μW at 1 MS/s," *IEEE J. Solid-State Circuits*, vol. 45, no. 5, pp. 1007–1015, May 2010.
- [10] P. Harpe et al., "A 30 fJ/conversion-step 8b 0-to-10 MS/s asynchronous SAR ADC in 90 nm CMOS," in IEEE ISSCC Dig., 2010, pp. 387–389.

- [11] M. Hesener et al., "A 14b 40 MS/s redundant SAR ADC with 480 MHz clock in 0.13 μm CMOS," in IEEE ISSCC Dig., 2007, pp. 248–249.
- [12] W. Liu et al., "A 12b 22.5/45 MS/s 3.0 mW 0.059 mm<sup>2</sup> CMOS SAR ADC achieving over 90 dB SFDR," in *IEEE ISSCC Dig.*, 2010, pp. 380–381.
- [13] C. C. Liu et al., "A 10b 100 MS/s 1.13 mW SAR ADC with binary-scaled error compensation," in IEEE ISSCC Dig., 2010, pp. 386–387.
- [14] ADC12D1x00 12-bit ADC Family. National Semiconductors website [Online]. Available: http://www.national.com/assets/en/other/national\_adc12d1x00\_product\_brief.pdf
- [15] K. Doris et al., "A 480 mW 2.6 GS/s 10b 65 nm time-interleaved ADC with 48.5 dB SNDR up to Nyquist," in IEEE ISSCC Dig., 2011, pp. 180–182.
- [16] N. Kurosawa et al., "Explicit analysis of channel mismatch effects in time-interleaved ADC systems," *IEEE Trans. Circuits Syst. I, Fund. Theory Applicat.*, vol. 48, no. 3, pp. 261–271, Mar. 2001.
- [17] C. Vogel, "The impact of combined channel mismatch effects in time-interleaved ADCs," *IEEE Trans. Instrum. Meas.*, vol. 54, no. 1, pp. 415–427, 2005.
- [18] A. Balakrishnan, "On the problem of time jitter in sampling," *IRE Trans. Inf. Theory*, vol. 8, no. 3, pp. 226–236, 1962.
- [19] K. Doris, "High-Speed D/A converters: From analysis and synthesis concepts to IC implementation," Ph.D. dissertation, Technical University Eindhoven, Eindhoven, The Netherlands, 2004.
- [20] K. Poulton et al., "A 20 GS/s 8b ADC with a 1 MB memory in 0.18 μm CMOS," in IEEE ISSCC Dig., 2003, pp. 318–319.
- [21] R. C. Taft et al., "A 1.8 V 1.0 GS/s 10b self-calibrating unified-folding-interpolating ADC with 9.1 ENOB at Nyquist frequency," in *IEEE ISSCC Dig.*, 2009, pp. 77–78, 79a.
- [22] R. Payne *et al.*, "A 12b 1 GS/s SiGe BiCMOS two-way time-interleaved pipeline ADC," in *IEEE ISSCC Dig.*, 2011, pp. 182–184.
- [23] H. van de Vel et al., "A 1.2-V 250-mW 14-b 100-MS/s digitally calibrated pipeline ADC in 90-nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1047–1056, Apr. 2009.
- [24] A. M. A. Ali et al., "A 16-bit 250-MS/s IF sampling pipelined ADC with background calibration," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2602–2612, Dec. 2010.
- [25] B. Verbruggen et al., "A 2.6 mW 6b 2.2 GS/s 4-times interleaved fully dynamic pipelined ADC in 40 nm digital CMOS," in *IEEE ISSCC Dig.*, 2010, pp. 296–297.
- [26] S. Gupta et al., "A 1 GS/s 11b time-interleaved ADC with 55-dB SNDR, 250 mW power realized by a high bandwidth scalable time-interleaved architecture," *IEEE J. Solid-State Circuits*, vol. 41, pp. 2650–2657, Dec. 2006.
- [27] A. Nazemi et al., "A 10.3 Gs/s 6 bit (5.1 ENOB at Nyquist) time-interleaved/pipelined ADC using open-loop amplifiers and digital calibration in 90 nm CMOS," in Proc. IEEE Symp. VLSI Circuits, 2008, pp. 18–19.
- [28] B. Murmann, "Low-power pipelined A/D conversion," in Proc. 20th Workshop on Advances in Analog Circuit Design (AACD), Apr. 2011.
- [29] M. Gustavsson and N. N. Tan, "A global passive sampling technique for high-speed switched-capacitor time-interleaved ADCs," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 47, no. 9, pp. 821–831, Sep. 2000.
- [30] B. P. Ginsburg and A. P. Chandrakasan, "Highly interleaved 5-bit, 250-MSample/s, 1.2-mW ADC with redundant channels in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 43, no. 12, pp. 2641–2650, Dec. 2008.
- [31] A. M. Abo and P. R. Gray, "A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter," *IEEE J. Solid-State Circuits*, vol. 34, pp. 599–606, May 1999.
- [32] C.-C. Hsu et al., "An 11b 800 MS/s time-interleaved ADC with digital background calibration," in *IEEE ISSCC Dig.*, 2007, pp. 464–465.
- [33] C.-C. Hsu *et al.*, "A 7b 1.1 GS/s reconfigurable time-interleaved ADC in 90 nm CMOS," in *Proc. IEEE Symp. VLSI Circuits*, 2007, pp. 66–67.
- [34] R. C. Taft et al., "A 1.8-V 1.6-GSample/s 8-b self-calibrating folding ADC with 7.26 ENOB at Nyquist frequency," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2107–2115, Dec. 2004.
- [35] X. Jiang and M.-C. F. Chang, "A 1-GHz signal bandwidth 6-bit CMOS ADC with power-efficient averaging," *IEEE J. Solid-State Circuits*, vol. 40, no. 2, pp. 532–535, Feb. 2005.
- [36] Z. Cao, S. Yan, and Y. Li, "A 32 mW 1.25 GS/s 6b 2b/step SAR ADC in 0.13 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 862–873, Mar. 2009.
- [37] B. Razavi, *Principles of Data Conversion System Design*. New York: IEEE Press, 1995, vol. 126.

- [38] T. Kobayashi *et al.*, "A current-controlled latch sense amplifier and a static power-saving input buffer for low-power architectures," *IEEE J. Solid-State Circuits*, vol. 28, no. 4, pp. 523–527, Apr. 1993.
- [39] Data over cable service interface specifications DOCSIS3.0, CableLabs, Physical Layer Specification CM-SP-PHYv3.0-I09-101008.
- [40] J. Elbornsson, F. Gustafsson, and J.-E. Eklund, "Analysis of mismatch noise in randomly interleaved ADC system," in *Proc. IEEE ICASSP*, 2003, vol. 6, pp. 277–280.



Kostas Doris (M'04) was born in Thessaloniki, Greece, in 1973. He received the degree in physics in 1996 and the M.Sc. degree in radio-electronics in 1998, both from the Aristotle University of Thessaloniki, Greece. In 2004, he received the Ph.D. degree from the Technical University of Eindhoven, The Netherlands.

He joined Philips Research in 2003, and subsequently, in 2006, NXP Semiconductors. He is currently Senior Principal Scientist in the Central Research & Development Department of NXP, heading the department of High-speed Data Acquisition. His areas of interest include high-speed/high-resolution data conversion systems and highly digitized transceivers. He is the (co-)author of many papers and patents in the field of data converters, and the author of the book *Wide-Bandwidth High Dynamic Range D/A Converters* (Springer, 2006).



Erwin Janssen (M'11) was born in Ede, The Netherlands, in 1976. He received the M.Sc. degree (cum laude) in electrical engineering, with an additional degree in computer science, from the University of Twente, Enschede, The Netherlands, in 2001. He received the Ph.D. degree in electrical engineering from the Technical University of Eindhoven, The Netherlands, in 2010.

From 2001 to 2006 he was with Philips Research Laboratories, Eindhoven, The Netherlands, and worked on various aspects of signal processing for digital audio. In 2006 he joined NXP Semiconductors, Eindhoven, The Netherlands, and since then he has specialized in mixed-signal IC design, with a special focus on calibration techniques for high speed data conversion systems. His other research interests include digital signal processing and sigma-delta modulation. He is the author of the book *Look-Ahead Based Sigma-Delta Modulation*.



**Claudio Nani** was born in Brescia, Italy, in 1983. He received the B.Sc. and M.Sc. degrees in electrical engineering from the University of Pisa, Italy, in 2005 and 2007, respectively.

From 2007 to 2010, he was with the Mixed Signal Circuits and Systems group of NXP Semiconductors Research, Eindhoven, The Netherlands, where he worked on high-speed ADCs for cable tuner applications. Since 2010 he has been with Marvell, Pavia, Italy, working on high-performance data converters for telecommunication systems. His current research interests include high-speed power-efficient ADCs, data converter digital calibration systems and mixed-signal design methodologies.



Athon Zanikopoulos (M'09) was born in Volos, Greece, in 1974. He received the degree in physics and the M.Sc. degree in electronics from Aristotle University of Thessaloniki, Greece, in 1998 and 2001, respectively. In 2002, he began working toward the Ph.D. degree at Eindhoven University of Technology, Eindhoven, The Netherlands, in the field of analog-to-digital converters. In 2007, he joined NXP Semiconductors, Eindhoven, The Netherlands, where he currently holds a Research Scientist position in the Central Research and Development Department. His research interests include high-speed high-performance analog-to-digital and digital-to-analog conversion and reconfigurable analog/mixed-signal systems.



**Gerard van der Weide** was born in 't Harde, The Netherlands, on November 23, 1971. He received the M.Sc. degree in electrical engineering from Twente University of Technology, Enschede, The Netherlands, in 1995.

In 1995, he joined the Mixed-Signal Circuits and Systems group of Philips Research Laboratories, Eindhoven, The Netherlands, where he has been working on high-speed analog-to-digital converters and associated circuits. In 2006, he joined the research group of NXP Semiconductors, where he is currently involved in the top-level integration of RF and mixed-signal ICs for wireless connectivity applications.